Building the PDT-Vallex valency lexicon
Abstract
In this contribution, we relate the development of a richly annotated corpus to that of a computational valency lexicon. Our valency lexicon, called PDT-Vallex (Hajič et al., 2003), was created as a “byproduct” of the annotation of the Prague Dependency Treebank (PDT), but it has become an important resource for further linguistic research as well as for computational processing of the Czech language. We present a description of the verbal part of this lexicon (more than 5300 verbs with 8200 valency frames), which has been built on the basis of the PDT corpus. A rigorous approach and the linking of each verb occurrence to the valency lexicon have made it possible to verify and refine the very notion of valency as introduced in the Functional Generative Description theory (Sgall et al., 1986; Panevová, 1974-5). Every occurrence of a verb in the corpus contains a reference to its valency frame (i.e., to an entry in the PDT-Vallex valency lexicon). The annotators insert the verbs (verb senses) found in the course of the annotation, together with their associated valency frames, into the lexicon, adding one or more usage examples taken directly from the corpus. They also insert a note that refers to another verb that has one of its valency frames related to the current one (a synonym/antonym, an aspectual counterpart, etc.). Every slot of each valency frame records a functor together with its surface realization. The mapping between the valency frame and its surface realization is generally quite complex (Hajič and Urešová, 2003). The most common surface realizations are a morphemic case, a preposition with a case, and a subordinate clause. The valency frame is fully formalized to allow for automatic processing of the valency dictionary entries. Verb complementations are marked for obligatoriness, and their surface realization is attached.
The realization of inner participants (arguments) is always given in full, since there is no “standard” or “default” realization; the realization of free modifications (adjuncts) need not be specified. PDT-Vallex is available as part of the PDT version 2 published by the Linguistic Data Consortium (http://www.ldc.upenn.edu, LDC2006T01).
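The structure described in the abstract (frames made of slots, each slot carrying a functor, an obligatoriness flag and its surface realizations, with every corpus occurrence pointing at a frame) can be illustrated with a minimal Python sketch. This is not the actual PDT-Vallex file format: all class names, field names, the frame identifiers and the `INNER_PARTICIPANTS` set are assumptions made for illustration, and the sample entry for "dát" is hand-written, not copied from the lexicon.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed set of inner-participant functors (illustrative, not taken from the lexicon files).
INNER_PARTICIPANTS = {"ACT", "PAT", "ADDR", "ORIG", "EFF"}


@dataclass(frozen=True)
class SurfaceForm:
    """One surface realization of a slot (hypothetical encoding)."""
    kind: str                          # "case", "prep+case" or "clause"
    case: Optional[int] = None         # Czech morphemic case, 1-7
    preposition: Optional[str] = None  # used only with "prep+case"
    conjunction: Optional[str] = None  # used only with "clause"


@dataclass
class Slot:
    functor: str                       # e.g. "ACT", "PAT", "ADDR"
    obligatory: bool
    forms: List[SurfaceForm] = field(default_factory=list)


@dataclass
class ValencyFrame:
    frame_id: str                      # hypothetical identifier scheme
    lemma: str
    slots: List[Slot]
    examples: List[str] = field(default_factory=list)  # usage examples from the corpus
    note: str = ""                     # pointer to a related verb, if any


@dataclass
class VerbOccurrence:
    token_id: str
    lemma: str
    frame_ref: str                     # must name an entry in the lexicon


def missing_realizations(frame: ValencyFrame) -> List[str]:
    """Functors of inner participants that lack an explicit surface form."""
    return [s.functor for s in frame.slots
            if s.functor in INNER_PARTICIPANTS and not s.forms]


def unresolved(occurrences: List[VerbOccurrence],
               lexicon: Dict[str, ValencyFrame]) -> List[VerbOccurrence]:
    """Occurrences whose frame reference has no entry in the lexicon."""
    return [o for o in occurrences if o.frame_ref not in lexicon]


# Hand-written illustrative frame for "dát" (to give): someone gives something to someone.
dat = ValencyFrame(
    frame_id="f-dat-1",
    lemma="dát",
    slots=[
        Slot("ACT", True, [SurfaceForm("case", case=1)]),   # nominative
        Slot("PAT", True, [SurfaceForm("case", case=4)]),   # accusative
        Slot("ADDR", True, [SurfaceForm("case", case=3)]),  # dative
    ],
)
lexicon = {dat.frame_id: dat}
occurrences = [
    VerbOccurrence("t1", "dát", "f-dat-1"),
    VerbOccurrence("t2", "dát", "f-dat-9"),  # dangling reference, for demonstration
]
```

Under these assumptions, `unresolved(occurrences, lexicon)` flags the occurrence whose reference has no lexicon entry, and `missing_realizations` flags an argument slot left without a surface form, mirroring the two consistency requirements stated above.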
Similar resources
Valency in the Prague Dependency Treebank: Building the Valency Lexicon
In this article we focus on valency, which belongs to the core phenomena being captured in the underlying level of the Prague Dependency Treebank (PDT). We present a summary of the basic principles of the applied theoretical framework including proposals for suitable refinement relevant to NLP. The current status of description of valency behavior of verbs, nouns and adjectives is outlined. We ...
Building a Bilingual ValLex Using Treebank Token Alignment: First Observations
In this paper we explore the potential and limitations of a concept of building a bilingual valency lexicon based on the alignment of nodes in a parallel treebank. Our aim is to build an electronic Czech↔English Valency Lexicon by collecting equivalences from bilingual treebank data and storing them in two already existing electronic valency lexicons, PDT-VALLEX and Engvallex. For this task a s...
Advanced Searching in the Valency Lexicons Using PML-TQ Search Engine
This paper presents a sophisticated way to search valency lexicons. We provide a visualization of lexicons with built-in searching that allows users to draw sophisticated queries in a graphical mode. We exploit PML-TQ, a query language based on the tree editor TrEd. For demonstration purposes, we focus on VALLEX and PDT-VALLEX, two Czech valency lexicons of verbs. We propose a common le...
Nominal Valency in Lexicons
The term valency refers to the number, type and form of arguments that are bound to a word. Valency is specific to any given lexical unit and therefore is covered by lexicons. This is a preliminary survey conducted with the creation of a valency lexicon of Czech nouns in mind. The authors of such a lexicon have to decide who will be the intended users, how the material will be presented and whi...
Czech-English Bilingual Valency Lexicon Online
We describe CzEngVallex, a bilingual Czech–English valency lexicon which aligns verbal valency frames and their arguments. It is based on a parallel Czech-English corpus, the Prague Czech-English Dependency Treebank (PCEDT), where for each occurrence of a verb, a reference to the underlying Czech and English valency lexicons (PDT-Vallex and EngVallex, respectively) is recorded. The CzEngValle...